105 research outputs found

    Lifting transforms on graphs and their application to video coding

    Compact representations of data are very useful in many applications such as coding, denoising or feature extraction. “Classical” transforms such as Discrete Cosine Transforms (DCT) or Discrete Wavelet Transforms (DWT) provide sparse approximations of smooth signals, but lose efficiency when they are applied to signals with large discontinuities. In such cases, directional transforms, which are able to adapt their basis functions to the underlying signal structure, improve the performance of “classical” transforms. In this PhD Thesis we describe a general class of lifting transforms on graphs that can be seen as N-dimensional directional transforms. Graphs are constructed so that every node corresponds to a specific sample point of a discrete N-dimensional signal and links between nodes represent correlation between samples. Therefore, non-correlated samples (e.g., samples across a large discontinuity in the signal) should not be linked. We propose a lifting-based directional transform that can be applied to any undirected graph. In this transform, filtering operations are performed following high-correlation directions (indicated by the links between nodes), thus avoiding filtering across large discontinuities, which would give rise to large high-pass coefficients in those locations. In this way, the transform efficiently exploits the correlation that exists between data on the graph, leading to a more compact representation. We mainly focus on the design and optimization of these lifting transforms on graphs, studying and discussing the three main steps required to obtain an invertible and critically sampled transform: (i) graph construction, (ii) design of “good” graph bipartitions, and (iii) filter design. We also explain how to extend the transform to J levels of decomposition, obtaining a multiresolution analysis of the original N-dimensional signal.
The proposed transform has many desirable properties, such as perfect reconstruction, critical sampling, easy generalization to N-dimensional domains, non-separable and one-dimensional filtering operations, localization in frequency and in the original domain, and the ability to choose any filtering direction. As an application, we develop a graph-based video encoder where the goal is to obtain a compact representation of the original video sequence. To this end, we first propose a graph representation of the video sequence and then design a 3-dimensional (spatio-temporal) non-separable directional transform. This can be viewed as an extension of wavelet transform-based video encoders that operate in the spatial and in the temporal domains independently. Our transform yields better compaction ability (in terms of non-linear approximation) than a state-of-the-art motion-compensated temporal filtering transform (which can be interpreted as a temporal wavelet transform) and a comparable hybrid Discrete Cosine Transform (DCT)-based video encoder (which is the basis of the latest video coding standards). In order to obtain a complete video encoder, the transform coefficients and the side information (needed to obtain an invertible scheme) should be entropy coded and sent to the decoder. Therefore, we also propose a coefficient-reordering method, based on the information of the graph, that improves the compression ability of the entropy encoder. Furthermore, we design two different low-cost approaches which aim to reduce the high computational complexity of the proposed system without causing significant losses of compression performance. The proposed complete system leads to an efficient encoder which significantly outperforms a comparable hybrid DCT-based encoder in rate-distortion terms.
Finally, we investigate how rate-distortion optimization can be applied to the proposed coding scheme.

Compact representations of signals are useful in many applications, such as compression, noise reduction, and feature extraction. “Classical” transforms such as the Discrete Cosine Transform (DCT) or the Discrete Wavelet Transform (DWT) achieve compact approximations of smooth signals, but lose efficiency when applied to signals that contain large discontinuities. In these cases, directional transforms, which can adapt their basis functions to the structure of the signal under analysis, improve on the efficiency of the “classical” transforms. In this thesis we focus on the design and optimization of lifting transforms on graphs, which can be interpreted as N-dimensional directional transforms. The graphs are constructed so that each node corresponds to a specific sample of a discrete N-dimensional signal, and the links between nodes represent correlation between samples. Thus, uncorrelated samples (for example, samples lying on either side of a discontinuity) should not be linked. On the resulting graph we apply transforms based on the lifting scheme, in which the filtering operations follow the directions indicated by the links between nodes (high-correlation directions). In this way we avoid filtering across large discontinuities (which would produce large-valued coefficients at those discontinuities), giving rise to a directional transform that exploits the correlation between the signal samples on the graph and yields a compact representation of the signal.
In this thesis we focus mainly on investigating the three main steps required to obtain a directional transform based on the lifting scheme applied to graphs: (i) graph construction, (ii) the design of graph bipartitions, and (iii) the definition of the filters. A good design of these three processes determines, among other things, the energy compaction ability of the transform. We also explain how to extend this type of transform to J levels of decomposition, obtaining a multiresolution analysis of the original N-dimensional signal. The proposed transform has many desirable properties, such as perfect reconstruction, critical sampling, easy generalization to N-dimensional domains, non-separable and one-dimensional filtering operations, localization in frequency and in the original domain, and the ability to choose any filtering direction. As an application, we develop a graph-based video encoder whose goal is to obtain a compact version of the original video signal. To this end, we first propose a graph representation of the video sequence and then design 3-dimensional (spatio-temporal) non-separable directional transforms. Our encoder can be interpreted as an extension of wavelet-based video encoders, which operate independently (separably) in the spatial and temporal domains. The proposed transform achieves better results (in terms of non-linear approximation) than a state-of-the-art method based on motion-compensated temporal wavelets and a comparable DCT encoder (the basis of the latest video coding standards). To obtain a complete video encoder, the transform coefficients and the side information (needed to obtain an invertible scheme) must be entropy coded and sent to the decoder.
Therefore, we also propose in this thesis a coefficient-reordering method based on the graph information that improves the compression ability of the entropy encoder. The proposed coding scheme significantly improves on the efficiency of a hybrid DCT-based encoder in rate-distortion terms. However, our method has the drawback of high computational complexity. To alleviate this problem, we design two algorithms that aim to reduce this complexity without affecting compression ability. Finally, we investigate how to perform rate-distortion optimization on the proposed graph-based encoder.
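
    The predict/update structure described in this abstract can be illustrated with a minimal sketch. It assumes one decomposition level, a predict filter that subtracts the mean of the linked update (U) neighbours, and an update filter that adds half the mean detail of the linked prediction (P) neighbours. These filters, the function names, and the toy signal are illustrative assumptions, not the filters designed in the thesis.

```python
def _neighbors(nodes, edges):
    """Adjacency sets for an undirected edge list."""
    adj = {n: set() for n in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def lifting_forward(signal, edges, update_nodes):
    """One lifting level: P nodes get detail coefficients, U nodes get smooth ones."""
    adj = _neighbors(signal, edges)
    coeffs = dict(signal)
    # Predict step: subtract the mean of the linked U neighbours from each P node.
    for p in signal:
        if p in update_nodes:
            continue
        nbrs = [u for u in adj[p] if u in update_nodes]
        if nbrs:
            coeffs[p] = signal[p] - sum(signal[u] for u in nbrs) / len(nbrs)
    # Update step: add half the mean detail of the linked P neighbours to each U node.
    for u in update_nodes:
        nbrs = [p for p in adj[u] if p not in update_nodes]
        if nbrs:
            coeffs[u] = signal[u] + sum(coeffs[p] for p in nbrs) / (2 * len(nbrs))
    return coeffs


def lifting_inverse(coeffs, edges, update_nodes):
    """Undo the update step, then the predict step, in reverse order."""
    adj = _neighbors(coeffs, edges)
    signal = dict(coeffs)
    for u in update_nodes:
        nbrs = [p for p in adj[u] if p not in update_nodes]
        if nbrs:
            signal[u] = coeffs[u] - sum(coeffs[p] for p in nbrs) / (2 * len(nbrs))
    for p in coeffs:
        if p in update_nodes:
            continue
        nbrs = [u for u in adj[p] if u in update_nodes]
        if nbrs:
            signal[p] = coeffs[p] + sum(signal[u] for u in nbrs) / len(nbrs)
    return signal


# Toy 1-D signal with a discontinuity between nodes 2 and 3 (hypothetical data).
signal = {0: 10, 1: 10, 2: 10, 3: 50, 4: 50, 5: 50}
update_nodes = {0, 2, 4}
edges_cut = [(0, 1), (1, 2), (3, 4), (4, 5)]   # no link across the discontinuity
edges_full = edges_cut + [(2, 3)]              # link that crosses it
coeffs_cut = lifting_forward(signal, edges_cut, update_nodes)
coeffs_full = lifting_forward(signal, edges_full, update_nodes)
```

    In this toy case, leaving out the link across the discontinuity keeps every detail coefficient at zero, while adding the link produces a large detail at node 3. Both graphs remain exactly invertible because the inverse replays the same steps in reverse order.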

    Standard compliant flicker reduction method with PSNR loss control

    Proceedings: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2013). Vancouver, Canada, May 26-31, 2013. Flicker is a common video coding artifact that occurs especially at low and medium bit rates. In this paper we propose a temporal filter-based method to reduce flicker. The proposed method has been designed to be compliant with conventional video coding standards, i.e., to generate a bitstream that is decodable by any standard decoder implementation. The aim of the proposed method is to make the luminance changes between consecutive frames smoother on a block-by-block basis. To this end, a selective temporal low-pass filter is proposed that smooths these luminance changes on flicker-prone blocks. Furthermore, since the low-pass filtering can cause a noticeable blurring effect, an adaptive algorithm that limits the PSNR loss (and thus the blur) has also been designed. The proposed method has been extensively assessed on the reference software of the H.264/AVC video coding standard and compared to a state-of-the-art method. The experimental results show the effectiveness of the proposed method and indicate that its performance is superior to that of the state-of-the-art method.
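
    As a rough illustration of temporal low-pass filtering with a bounded quality loss, the sketch below blends a block with its co-located block in the previous frame and caps the blend strength. It assumes a first-order filter and uses the PSNR between the filtered and unfiltered block as a simplified proxy for the PSNR loss; `smooth_block`, `alpha_max` and `min_psnr_db` are hypothetical names, and the detection of flicker-prone blocks is not modelled.

```python
import math

PEAK = 255.0  # 8-bit luminance assumed


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def psnr(a, b):
    err = mse(a, b)
    return math.inf if err == 0 else 10 * math.log10(PEAK ** 2 / err)


def smooth_block(cur, prev, alpha_max=0.5, min_psnr_db=40.0):
    """Blend cur toward prev; cap the strength so PSNR(filtered, cur) >= min_psnr_db."""
    diff = mse(cur, prev)
    if diff == 0:
        return list(cur), 0.0
    # MSE(filtered, cur) = alpha**2 * diff, so the largest admissible alpha is closed-form.
    max_err = PEAK ** 2 / 10 ** (min_psnr_db / 10)
    alpha = min(alpha_max, math.sqrt(max_err / diff))
    return [(1 - alpha) * c + alpha * p for c, p in zip(cur, prev)], alpha


cur_block = [100, 102, 98, 101]    # hypothetical luminance samples
prev_block = [110, 112, 108, 111]
filtered, strength = smooth_block(cur_block, prev_block)
```

    Because the distortion introduced by this particular filter grows with the square of the blend strength, the cap can be computed directly instead of searched for.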

    Mode Decision-Based Algorithm for Complexity Control in H.264/AVC

    The latest H.264/AVC video coding standard achieves high compression rates in exchange for high computational complexity. Nowadays, however, many application scenarios require the encoder to meet some complexity constraints. This paper proposes a novel complexity control method that relies on hypothesis testing and can handle time-variant content and target complexities. Specifically, it is based on a binary hypothesis test that decides, on a macroblock basis, whether to use a low- or a high-complexity coding model. Gaussian statistics are assumed so that the probability density functions involved in the hypothesis test can be easily adapted. The decision threshold is also adapted according to the deviation between the actual and the target complexities. The proposed method is implemented on the H.264/AVC reference software JM10.2 and compared with a state-of-the-art method. Our experimental results show that the proposed method achieves a better trade-off between complexity control and coding efficiency. Furthermore, it leads to a lower deviation from the target complexity. This work has been partially supported by the National Grant TEC2011-26807 of the Spanish Ministry of Science and Innovation.
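
    A binary hypothesis test with Gaussian class models and an adaptive threshold might look like the sketch below. The scalar feature, the model parameters, the normalised threshold update and its `gain` are all assumptions made for illustration; the abstract does not specify them.

```python
import math


def gaussian_logpdf(x, mean, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def choose_high_complexity(x, low, high, threshold):
    """Pick the high-complexity model when the log-likelihood ratio exceeds the threshold."""
    llr = gaussian_logpdf(x, *high) - gaussian_logpdf(x, *low)
    return llr > threshold


def adapt_threshold(threshold, actual, target, gain=1.0):
    """Raise the threshold when the encoder runs over its complexity target."""
    return threshold + gain * (actual - target) / target


low_model = (10.0, 4.0)    # (mean, variance) under the low-complexity hypothesis
high_model = (30.0, 4.0)   # (mean, variance) under the high-complexity hypothesis
threshold = 0.0
decision_before = choose_high_complexity(25.0, low_model, high_model, threshold)
# The encoder is 50% over budget, so the test becomes stricter.
threshold = adapt_threshold(threshold, actual=1.5, target=1.0, gain=60.0)
decision_after = choose_high_complexity(25.0, low_model, high_model, threshold)
```

    With these hypothetical numbers the same macroblock feature is sent to the high-complexity model before the adaptation and to the low-complexity model afterwards, which is the intended effect of tying the threshold to the complexity deviation.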

    Bayesian adaptive algorithm for fast coding unit decision in the High Efficiency Video Coding (HEVC) standard

    The latest High Efficiency Video Coding standard (HEVC) provides a set of new coding tools to achieve a significantly higher coding efficiency than previous standards. In this standard, the pixels are first grouped into Coding Units (CU), then Prediction Units (PU), and finally Transform Units (TU). All these coding levels are organized into a quadtree-shaped arrangement that allows highly flexible data representation; however, they involve a very high computational complexity. In this paper, we propose an effective early CU depth decision algorithm to reduce the encoder complexity. Our proposal is based on a hierarchical approach, in which a hypothesis test is designed to make a decision at every CU depth, where the algorithm either produces an early termination or decides to evaluate the subsequent depth level. Moreover, the proposed method is able to adaptively estimate the parameters that define each hypothesis test, so that it adapts its behavior to the variable contents of the video sequences. The proposed method has been extensively tested, and the experimental results show that our proposal outperforms several state-of-the-art methods, achieving a significant reduction of the computational complexity (36.5% and 38.2% average reductions in coding time for two different encoder configurations) in exchange for very slight losses in coding performance (1.7% and 0.8% average bit rate increments). This work has been partially supported by the National Grant TEC2014-53390-P of the Spanish Ministry of Economy and Competitiveness.
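
    The hierarchical decision with adaptively estimated parameters can be sketched as follows. The sketch assumes one scalar feature per depth, Gaussian class models updated online with Welford's algorithm, equal priors and costs (so the test reduces to comparing likelihoods), and `max_depth=3`. None of these choices is taken from the paper.

```python
import math


class RunningGaussian:
    """Online mean/variance estimate (Welford's algorithm)."""

    def __init__(self):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def logpdf(self, x):
        # Fall back to unit variance until two samples have been seen.
        var = max(self.m2 / (self.n - 1), 1e-9) if self.n > 1 else 1.0
        return -0.5 * (math.log(2 * math.pi * var) + (x - self.mean) ** 2 / var)


def decide_cu_depth(features, models, max_depth=3):
    """Return the first depth whose test favours 'stop', else max_depth."""
    for depth in range(max_depth):
        stop_model, split_model = models[depth]
        if stop_model.logpdf(features[depth]) > split_model.logpdf(features[depth]):
            return depth  # early termination: deeper levels are not evaluated
    return max_depth


# Hypothetical training samples: small features for 'stop', large ones for 'split'.
models = []
for _ in range(3):
    stop, split = RunningGaussian(), RunningGaussian()
    for x in (1.0, 2.0, 3.0):
        stop.update(x)
    for x in (9.0, 10.0, 11.0):
        split.update(x)
    models.append((stop, split))

chosen_depth = decide_cu_depth([10.0, 2.0, 10.0], models)
```

    In an encoder, each model would keep receiving `update` calls as blocks are coded, which is what lets the tests follow changes in the video content.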

    Standard-Compliant Low-Pass Temporal Filter to Reduce the Perceived Flicker Artifact

    Flicker is a common video-compression-related temporal artifact. It occurs when co-located regions of consecutive frames are not encoded in a consistent manner, especially when Intra frames are periodically inserted at low and medium bit rates. In this paper we propose a flicker reduction method which aims to make the luminance changes between pixels in the same area of consecutive frames less noticeable. To this end, a temporal low-pass filter is proposed that smooths these luminance changes on a block-by-block basis. The proposed method has some advantages over other state-of-the-art methods. It has been designed to be compliant with conventional video coding standards, i.e., to generate a bitstream that is decodable by any standard decoder implementation. The filter strength is estimated on-the-fly to limit the PSNR loss and thus the appearance of a noticeable blurring effect. The proposed method has been implemented on the H.264/AVC reference software and thoroughly assessed in comparison to a couple of state-of-the-art methods. The flicker reduction achieved by the proposed method (calculated using an objective measurement) is notably higher than that of the compared methods: 18.78% versus 5.32% and 31.96% versus 8.34%, in exchange for some slight losses in terms of coding efficiency. In terms of subjective quality, the proposed method is perceived as more than twice as good as the compared methods. This work has been partially supported by the National Grant TEC2011-26807 of the Spanish Ministry of Science and Innovation.

    Colección de prácticas de acústica de recintos (Collection of Room Acoustics Lab Assignments)

    Bachelor's Degree in Audiovisual Systems Engineering. Course: Room Acoustics. The aim of the elective course Room Acoustics (Acústica de Recintos) is to complement the content of the core course Electroacoustic Systems and Sound Reinforcement (Sistemas Electroacústicos y Sonorización) in two basic respects: developing the classical theories of acoustic propagation that describe the behavior of the sound field inside a room (statistical, geometrical, and wave-based), and going deeper into the concepts of acoustic conditioning and sound insulation. As part of the course material, this teaching manual includes the statements of three lab assignments to be carried out during the course. The first two aim to familiarize students with acoustic simulation software for enclosed spaces. With such software it is possible to reproduce the sound field of existing rooms, or even to learn how rooms that do not currently exist would sound. Proficiency with this type of software package is very important for students, since these tools are fundamental for developing virtual acoustic reality environments in which auralizations can be performed to obtain an audible representation of the simulated space. Likewise, these software tools are widely used in sound reinforcement for large-scale events. The third assignment covers the measurement of airborne sound insulation between rooms as a way of questioning most of the requirements that the regulations impose on this type of measurement, especially regarding the measurement of reverberation time and the sampling of the sound field.

    Filter optimization and complexity reduction for video coding using graph-based transforms

    The basis functions of lifting transforms on graphs are completely determined by finding a bipartition of the graph and defining the prediction and update filters to be used. In this work we consider the design of prediction filters that minimize the quadratic prediction error and therefore the energy of the detail coefficients, which gives rise to higher energy compaction. Then, to determine the graph bipartition, we propose a distributed maximum-cut algorithm that significantly reduces the computational cost with respect to the centralized version used in our previous work. The proposed techniques show improvements in coding performance and computational cost as compared to our previous work. This work was supported in part by NSF under grant CCF-1018977.
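
    To make the idea of a prediction filter that minimizes the quadratic prediction error concrete, the sketch below fits the weights of a prediction node with two linked update neighbours by ordinary least squares over training samples. The two-neighbour restriction, the use of training samples, and the name `ls_prediction_weights` are assumptions for illustration only.

```python
def ls_prediction_weights(p_samples, u1_samples, u2_samples):
    """Weights (w1, w2) minimising sum((p - w1*u1 - w2*u2)**2) via the normal equations."""
    r11 = sum(a * a for a in u1_samples)
    r22 = sum(b * b for b in u2_samples)
    r12 = sum(a * b for a, b in zip(u1_samples, u2_samples))
    r1 = sum(a * p for a, p in zip(u1_samples, p_samples))
    r2 = sum(b * p for b, p in zip(u2_samples, p_samples))
    det = r11 * r22 - r12 * r12
    if det == 0:
        raise ValueError("neighbour samples are linearly dependent")
    # Cramer's rule on the 2x2 system [[r11, r12], [r12, r22]] w = [r1, r2].
    return (r1 * r22 - r2 * r12) / det, (r2 * r11 - r1 * r12) / det


def detail_energy(p_samples, u1_samples, u2_samples, w1, w2):
    """Energy of the detail coefficients left after prediction."""
    return sum((p - w1 * a - w2 * b) ** 2
               for p, a, b in zip(p_samples, u1_samples, u2_samples))


# Hypothetical training data in which the P node follows 0.75*u1 + 0.25*u2.
u1 = [1.0, 2.0, 3.0, 4.0]
u2 = [2.0, 1.0, 0.0, 3.0]
p = [0.75 * a + 0.25 * b for a, b in zip(u1, u2)]
w1, w2 = ls_prediction_weights(p, u1, u2)
energy_ls = detail_energy(p, u1, u2, w1, w2)
energy_mean = detail_energy(p, u1, u2, 0.5, 0.5)
```

    On this data the fitted weights leave less detail energy than a plain average of the two neighbours, which is the sense in which a lower prediction error translates into better energy compaction.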

    Optimized Update/Prediction Assignment for Lifting Transforms on Graphs

    Transformations on graphs can provide compact representations of signals with many applications in denoising, feature extraction or compression. In particular, lifting transforms have the advantage of being critically sampled and invertible by construction, but the efficiency of the transform depends on the choice of a good bipartition of the graph into update (U) and prediction (P) nodes. This is the update/prediction (U/P) assignment problem, which is the focus of this paper. We analyze this problem theoretically and derive an optimal U/P assignment under assumptions about signal model and filters. Furthermore, we prove that the best U/P partition is related to the correlation between nodes on the graph and is not the one that minimizes the number of conflicts (connections between nodes of same label) or maximizes the weight of the cut. We also provide experimental results on randomly generated graph signals and real data from image and video signals that validate our theoretical conclusions, demonstrating improved performance over state-of-the-art solutions for this problem. This work was supported in part by NSF under Grant CCF-1018977 and in part by the Spanish Ministry of Economy and Competitiveness under Grants TEC2014-53390-P, TEC2014-52289-R, TEC2016-81900-REDT/AEI and TEC2017-83838-R.
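
    The two baseline criteria mentioned in this abstract, the number of conflicts and the weight of the cut, can be computed for a given U/P assignment as in the short sketch below. The function name and the toy triangle graph are hypothetical, and the sketch does not implement the optimal assignment derived in the paper.

```python
def conflicts_and_cut_weight(weighted_edges, update_nodes):
    """Count same-label edges (conflicts) and sum the weights of U-P edges (the cut)."""
    conflicts = 0
    cut_weight = 0.0
    for a, b, w in weighted_edges:
        if (a in update_nodes) == (b in update_nodes):
            conflicts += 1      # both endpoints are U, or both are P
        else:
            cut_weight += w     # edge joins a U node and a P node
    return conflicts, cut_weight


# Toy weighted triangle: any bipartition leaves at least one conflict.
triangle = [(0, 1, 1.0), (1, 2, 0.5), (2, 0, 2.0)]
stats_u0 = conflicts_and_cut_weight(triangle, {0})
stats_u1 = conflicts_and_cut_weight(triangle, {1})
```

    Both assignments have the same number of conflicts but different cut weights, so the two criteria do not necessarily rank assignments the same way.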

    Directional Transforms for Video Coding Based on Lifting on Graphs

    In this work we describe and optimize a general scheme based on lifting transforms on graphs for video coding. A graph is constructed to represent the video signal. Each pixel becomes a node in the graph and links between nodes represent similarity between them. Therefore, spatial neighbors and temporal motion-related pixels can be linked, while nonsimilar pixels (e.g., pixels across an edge) may not be. Then, a lifting-based transform, in which filtering operations are performed using linked nodes, is applied to this graph, leading to a 3-dimensional (spatio-temporal) directional transform which can be viewed as an extension of wavelet transforms for video. The design of the proposed scheme requires four main steps: (i) graph construction, (ii) graph splitting, (iii) filter design, and (iv) extension of the transform to different levels of decomposition. We focus on the optimization of these steps in order to obtain an effective transform for video coding. Furthermore, based on this scheme, we propose a coefficient reordering method and an entropy coder leading to a complete video encoder that achieves better coding performance than a motion compensated temporal filtering wavelet-based encoder and a simple encoder derived from H.264/AVC that makes use of similar tools as our proposed encoder (reference software JM15.1 configured to use 1 reference frame, no subpixel motion estimation, 16 × 16 inter and 4 × 4 intra modes). This work was supported in part by NSF under grant CCF-1018977 and by Spanish Ministry of Economy and Competitiveness under grants TEC2014-53390-P and TEC2014-52289-R.
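
    The graph construction step can be illustrated for a single frame with the sketch below. It links 4-connected spatial neighbours only when their intensities are similar, so that no link crosses a strong edge. The absolute-difference test, the `max_diff` threshold and the function name are assumptions, and temporal (motion-related) links are not modelled.

```python
def build_spatial_graph(frame, max_diff=10):
    """Link horizontally/vertically adjacent pixels whose intensities differ by <= max_diff."""
    rows, cols = len(frame), len(frame[0])
    edges = []
    for r in range(rows):
        for c in range(cols):
            # Only look right and down so that each undirected link is added once.
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols and abs(frame[r][c] - frame[rr][cc]) <= max_diff:
                    edges.append(((r, c), (rr, cc)))
    return edges


# Hypothetical 2x3 frame with a vertical edge between columns 1 and 2.
frame = [[10, 12, 200],
         [11, 13, 205]]
links = build_spatial_graph(frame)
```

    In this toy frame the pixels on each side of the vertical edge are linked among themselves, and no link joins column 1 to column 2.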